Feature descriptor

My feature descriptor first extracts a 16x16 patch around the detected interest point, rotated to the interest point's orientation. It then splits the patch into 16 cells of 4x4 pixels each. In each cell it computes the gradient direction and magnitude at every pixel, discards pixels whose gradient magnitude is too small, and builds a fuzzy 8-bin histogram of the surviving gradient directions. Concatenating the 16 histograms gives a 16*8 = 128-dimensional feature vector, which is then normalized by subtracting its mean and dividing by its standard deviation.
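The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used for the results below: the function name `describe_patch`, the threshold `min_mag`, the use of central differences for the gradient, and the choice to let every surviving pixel cast one unweighted vote are all hypothetical. The patch is assumed to be already rotated and given as a list of 16 rows of 16 grayscale values.

```python
import math

def describe_patch(patch, n_bins=8, cell=4, min_mag=1e-3):
    """Sketch: 128-D descriptor from a 16x16 oriented grayscale patch."""
    size = len(patch)
    cells_per_side = size // cell
    hist = [0.0] * (cells_per_side * cells_per_side * n_bins)
    bin_width = 2 * math.pi / n_bins
    for y in range(size):
        for x in range(size):
            # Central differences, clamped at the patch border (assumption).
            gx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            if math.hypot(gx, gy) < min_mag:
                continue  # throw out weak gradients
            theta = math.atan2(gy, gx) % (2 * math.pi)
            # Fuzzy vote: split one vote between the two nearest directions.
            pos = theta / bin_width
            lo = int(pos) % n_bins
            hi = (lo + 1) % n_bins
            frac = pos - int(pos)
            # Each 4x4 cell owns its own block of n_bins histogram entries.
            base = ((y // cell) * cells_per_side + (x // cell)) * n_bins
            hist[base + lo] += 1.0 - frac
            hist[base + hi] += frac
    # Normalize: subtract the mean, divide by the standard deviation.
    mean = sum(hist) / len(hist)
    std = math.sqrt(sum((v - mean) ** 2 for v in hist) / len(hist))
    if std == 0:
        return [0.0] * len(hist)  # flat patch: no usable gradients
    return [(v - mean) / std for v in hist]

# Usage on a synthetic horizontal ramp.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = describe_patch(ramp)
print(len(descriptor))  # 128
```

Keeping a separate block of 8 entries per cell is what preserves the spatial layout of the gradients; a single histogram over the whole patch would discard it.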

Design choices

I used a rotated patch to make the descriptor invariant to rotations in the image. I used a 16x16 patch size because it is large enough to capture the local structure around the interest point, but not so large that it includes irrelevant information. A single histogram over the whole patch would average out the information in its different regions, so I defined 16 separate 4x4 patches in order to retain the information from each region separately.

As we learned in class, edge directions tend not to change, and edge directions correspond well to gradient directions. I threw out weak edges by first thresholding the gradient magnitudes and keeping only the pixels above the threshold.

The problem with traditional histograms is that small changes in values around bin boundaries can lead to big changes in the histogram produced. Since edge directions are quite sensitive to noise, we would like the histogram to be less prone to errors due to noise, so I use a fuzzy histogram with 8 directions spaced evenly over [0, 2π). Each detected edge direction lies somewhere between two adjacent bins. Suppose its distance to bin 1 is d1 and its distance to bin 2 is d2; it then contributes d2/(d1 + d2) to bin 1 and d1/(d1 + d2) to bin 2. This way, small changes in detected edge directions produce commensurate changes in the histogram.

The final normalization makes the descriptor invariant to changes in ambient lighting conditions.
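The bin-boundary argument can be checked with a small experiment. The sketch below is an illustration only: the names `hard_vote`, `soft_vote` and `l1` are hypothetical, and the linear split between the two nearest directions is an assumed form of the fuzzy vote. It compares how much a histogram changes when a direction moves by 0.02 rad across a hard bin boundary.

```python
import math

N_BINS = 8
BIN_WIDTH = 2 * math.pi / N_BINS

def hard_vote(theta):
    """Traditional histogram: the whole vote goes to one bin."""
    hist = [0.0] * N_BINS
    hist[int((theta % (2 * math.pi)) / BIN_WIDTH) % N_BINS] = 1.0
    return hist

def soft_vote(theta):
    """Fuzzy histogram: the vote is split linearly between the two
    nearest of the 8 directions, with the closer one getting more."""
    pos = (theta % (2 * math.pi)) / BIN_WIDTH
    lo = int(pos) % N_BINS
    hi = (lo + 1) % N_BINS  # wraps around from the last direction to the first
    frac = pos - int(pos)
    hist = [0.0] * N_BINS
    hist[lo] += 1.0 - frac
    hist[hi] += frac
    return hist

def l1(a, b):
    """Sum of absolute differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Two nearly identical directions on either side of a hard bin boundary.
before = math.pi / 4 - 0.01
after = math.pi / 4 + 0.01
hard_change = l1(hard_vote(before), hard_vote(after))
soft_change = l1(soft_vote(before), soft_vote(after))
print(hard_change, soft_change)
```

With hard binning the vote jumps to a different bin, so the histogram changes by the maximum possible amount; with the fuzzy vote the change stays proportional to the small change in direction.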

Performance

ROC and AUC


ROC on Graf images	ROC on Yosemite images

Method	Graf AUC	Yosemite AUC
Simple descriptor + SSD	0.625210	0.897799
Simple descriptor + ratio test	0.686555	0.803459
MOPS + SSD	0.768227	0.867834
MOPS + ratio test	0.792365	0.843684
Custom descriptor + SSD	0.914804	0.891254
Custom descriptor + ratio test	0.754935	0.920598
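
The table compares two ways of scoring a match, SSD and the ratio test. As a reminder of how these scores are commonly defined, here is a minimal sketch; the function names and the handling of ties are assumptions and not necessarily what the benchmark code does. SSD scores a match by the sum of squared differences to the nearest descriptor, while the ratio test divides that distance by the distance to the second-nearest descriptor, so ambiguous matches get a score close to 1.

```python
def ssd(a, b):
    """Sum of squared differences between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_features(desc1, desc2, use_ratio=False):
    """For each descriptor in desc1, return (index of best match in desc2, score).

    Lower scores mean more confident matches in both modes.
    """
    matches = []
    for d in desc1:
        dists = sorted((ssd(d, e), j) for j, e in enumerate(desc2))
        best, j = dists[0]
        if use_ratio and len(dists) > 1:
            second = dists[1][0]
            # If the two nearest descriptors are both at distance zero,
            # the match is fully ambiguous, so give it the worst score.
            score = best / second if second > 0 else 1.0
        else:
            score = best
        matches.append((j, score))
    return matches

# Usage: the first query is equally close to two candidates, the second is not.
queries = [[0.0, 0.0], [5.0, 5.0]]
candidates = [[0.0, 1.0], [5.0, 5.0], [0.0, -1.0]]
print(match_features(queries, candidates))
print(match_features(queries, candidates, use_ratio=True))
```

In the usage example the first query has a small SSD but a ratio of 1, which shows how the two scores can rank the same match very differently.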

Harris operator images


Harris image for Graf image 1	Harris image for Yosemite image 1

Average AUC

Test set	5x5 window + SSD	5x5 window + ratio test	MOPS + SSD	MOPS + ratio test	Custom + SSD	Custom + ratio test
Graf	0.474497	0.533717	0.586486	0.559582	0.582125	0.564281
Leuven	0.511819	0.489997	0.677964	0.688354	0.676296	0.653333
Bikes	0.581818	0.564935	0.840260	0.583766	0.597403	0.500000
Wall	0.520478	0.596005	0.622346	0.649891	0.708243	0.675346

Strengths and weaknesses

In the Yosemite test case, the 5x5 simple descriptor does very well because the images are (more or less) simply translated and not rotated. The 5x5 descriptor does not take rotation into account, so its assumption that the pictures are not rotated holds here. The MOPS descriptor did not do as well in this test case, probably because noise can distort the detected orientation of an interest point, so the 40x40 patches sampled in the two images differ and do not match well. The custom descriptor seems to do slightly better than MOPS, as the fuzzy histograms are somewhat less prone to errors in detected orientations.

However, once rotation (as part of a perspective warp) appears, as in the Graf test cases, MOPS and the custom descriptor show a marked improvement over the 5x5 simple descriptor, as they both take rotation into account.

Since the MOPS and custom descriptors are both normalized, they performed well on the Leuven benchmark, which suggests that they are less sensitive to lighting changes.
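
The effect of the normalization can be seen on a toy example. The sketch below uses made-up intensity values and a hypothetical `normalize` helper; it models a lighting change as a gain and an offset applied to every value, which is an assumption about how lighting varies and only an approximation for real images.

```python
import math

def normalize(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Made-up intensities, and the same values under a brighter, higher-contrast light.
patch = [10.0, 20.0, 15.0, 40.0, 25.0]
brighter = [1.8 * v + 30.0 for v in patch]

# After normalization the two vectors agree up to rounding error.
max_diff = max(abs(a - b) for a, b in zip(normalize(patch), normalize(brighter)))
print(max_diff)
```

Subtracting the mean cancels the offset and dividing by the standard deviation cancels the (positive) gain, so the normalized vectors coincide.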

The Harris corner detector was not run in scale-space, and was thus sensitive to scaling. In the Bikes benchmark, the images are increasingly blurred, and running the Harris detector at different scales would have resulted in better interest point detection. All of the methods depend on the Harris detector and hence did not perform well in this case; the high score for MOPS + SSD is likely to be an anomaly.

In the Wall benchmark, the custom descriptor performed best. It uses edge directions instead of pixel values, and edge directions change less under perspective changes, which explains the improved performance.

Overall, the 5x5 simple descriptor does well at matching under translation when no other distortions are present. The MOPS and custom descriptors handle rotation and changes in ambient lighting well, but the custom descriptor handles perspective warp better. None of the descriptors did well on blurred images, since they all depend on the Harris detector, which should have been run at different scales to detect the canonical scale of each interest point. With the detected scale, the descriptors could then be computed at the canonical scale to achieve scale invariance.

More images

Here are some more test images from my own collection.

Custom descriptor + SSD on translated image

Custom descriptor + SSD on rotated image

Custom descriptor + SSD on scaled image

As we can see, the descriptor is invariant to translation and rotation, but sensitive to scaling. Performing feature detection and description in scale-space would alleviate this problem.